Abstract

Test developers are responsible for defining how test scores should be interpreted and 
used. The No Child Left Behind Act of 2001 (NCLB) directed the Secretary of Education 
to use results from the National Assessment of Educational Progress (NAEP) to confirm 
the proficiency scores from state-developed tests. There are two sets of federal 
definitions for the term “proficient,” one for NAEP and one for NCLB. NAEP’s “At or Above 
Basic” is the most directly comparable statistic for confirming state proficiency results. 
NAEP and state proficiency scores, however, should be used (and interpreted) with 
caution. Achievement level results may provide useful trend information for one group 
on one test, but the statistical properties of proficiency scores render them ill-suited for 
trend comparisons. It may well be that there is no defensible statistical method for 
using NAEP achievement level results to confirm a state’s proficiency scores. Until the 
federal law is amended, the proficiency score analyses it requires should be accompanied, 
whenever possible, by related analyses based on scale scores or effect sizes or both. 



Introduction 

This workshop session revisits a paper the author presented at the 2007 national 
conference on large-scale assessment entitled “An Explanation for the Large Differences 
between State and NAEP Proficiency Scores Reported for Reading in 2005” (Stoneberg, 
2007a). The content for the beginning and middle of today’s presentation has much in 
common with that paper, but the ending is remarkably different. 

Standards for Educational Testing 

The standards for educational and psychological testing - jointly established by the 
American Educational Research Association, the American Psychological Association, and 
the National Council for Measurement in Education - address the valid use of test scores 
(Joint Committee on Standards, 1999). Standard 1.2, for example, says "the test developer 
should set forth clearly how test scores are intended to be interpreted and used.” Standard 
1.4 says "if a test is used in a way that has not been validated, it is incumbent on the user to 
justify the new use, collecting new evidence if necessary.” 

The National Assessment Governing Board (NAGB) sets policy for NAEP while the 
National Center for Education Statistics (NCES) implements it. NAGB and NCES together 
constitute the “test developer” for NAEP. The state is the “test developer” for the state test, 
but the state must abide by federal statute and regulation as guided by NCLB program 
officials in the U.S. Department of Education. 

A New Use for NAEP 

Since its creation in 1969, NAEP has had two major goals: to assess student 
performance reflecting current educational and assessment practices, and to measure 
change in student performance reliably over time. To this end, NAEP has given careful 
attention to the standards for educational and psychological testing as established by the 
community of professionals engaged in educational research, measurement and evaluation, 
psychometrics, and statistics. 

The No Child Left Behind Act of 2001 (NCLB) created a new use for NAEP by 
stipulating that “the Secretary shall use information from a variety of sources, including the 
National Assessment of Educational Progress [...], state evaluations, and other research 
studies” to assess or evaluate the Title I program. The apparent motivation for requiring 
NAEP was to keep the states honest through external confirmation of the results that states 
reported for their NCLB tests. NAEP would serve this purpose well because all states would 
participate in NAEP and no state would have any control over the national assessment. 
Figure 1 illustrates the levels of scores available from the state NCLB test (i.e., student, 
school, district, and state) and NAEP (state and national). The challenge has been to come 
up with a defendable procedure to compare or match the state-level results from the NCLB 
test and NAEP. 

Figure 1. Levels of results reported for the NCLB (state) test and 
NAEP. The challenge is to match state-level results from both tests. 

NCLB required a state to develop its assessment so it could report on “two levels of 
high achievement (proficient and advanced) that determine how well children are 
mastering the material in the State academic content standards.” NCLB placed focus on 
reporting the percentage of students scoring at or above proficient on the state test. 
This statistic is known as AYP, or Adequate Yearly Progress. Unfortunately, the 
narrow focus on achievement level results in NCLB made it unavoidable that NAEP 
achievement level scores would be used to confirm state AYP reports. NCLB and NAEP use 
the same “names” for the various achievement levels (i.e., basic, proficient, and advanced), 
but the NCLB-mandated state tests and NAEP operate under different definitions for each 
achievement level name. It is a mistake to assume that proficient is proficient is proficient. 
NCLB’s state proficient is not the same as NAEP Proficient. 

NCLB Achievement Levels: Interpretation and Use 

The U.S. Department of Education implemented a peer review process to provide 
federal oversight as states developed their NCLB tests. A peer review team made up of 
out-of-state persons with expert knowledge and skills in curriculum and assessment visited the 
program. The team filled out an extensive review checklist while on-site and issued a 
report with findings and recommendations. The Title I programs in some states were fined 
because they did not take corrective action sufficient to “pass” peer review on subsequent visits. 

The peer review team was required to examine the state's definitions for the 
achievement levels. In particular, the team had to pass judgment on the state’s definition of 
proficient. It had to mark Yes or No on the checklist whether "The ‘proficient’ achievement 
level represents the attainment of grade-level expectations for that academic content area.” 
(U.S. Department of Education, 2004). 

It’s noteworthy here that before NCLB some state testing programs used out-of-level 
testing for students whose instructional levels were either below or above their grade level. 
This practice, however, did not survive the peer review process. It was made clear that 
NCLB required state tests to measure achievement of grade-level content and to be 
administered to students at that grade. "On-grade-level, at-grade-level" became the mantra 
for state tests under NCLB. 

While preparing for the reauthorization battle over the No Child Left Behind Act, the 
U.S. Department of Education published its blueprint for strengthening the law. It said, "We 
remain committed to ensuring that all students can read and do math at grade level or 
better by 2014. This is the basic purpose and mission of the No Child Left Behind Act.” (U.S. 
Department of Education, 2007). Substituting achievement level names specified in NCLB, 
this might be taken to mean that the intent of the law was to ensure that all students could 
read and do math at the proficient level or the advanced level. Indeed, the Department’s 
blueprint made it clear that state "proficient" means "at grade level" and that advanced 
means "better than grade level." 

NAEP Achievement Levels: Interpretation and Use 

The National Assessment Governing Board (NAGB) has not been silent about the 
interpretation and use of NAEP achievement level scores. It published achievement level 
reports to explain its interpretation of achievement level scores. The Board convened an 
ad hoc committee to study how NAEP might be used to confirm state test results, and received 
reports from the NAEP Validity Studies Panel. It has also published a framework for each 
assessment that expands upon the policy definitions of Basic, Proficient, and Advanced. 

Achievement Level Reports. In 2001, as Bush and Kennedy led the passage of the 
No Child Left Behind Act, NAGB published a series of booklets to inform the public about 
the interpretation and use of NAEP scores. Text from the reading booklet (identical 
language is also found in the booklets for writing, mathematics, science, U.S. history, 
geography, and civics) reads: 

Notice that there is no mention of "at grade level" performance in these 
achievement goals. In particular, it is important to understand clearly that 
the Proficient achievement level does not refer to "at grade” performance. 

Nor is performance at the Proficient level synonymous with "proficiency” 
in the subject. That is, students who may be considered proficient in a 
subject, given the common usage of the term, might not satisfy the 
requirements for performance at the NAEP achievement level. 

Further, Basic achievement is more than minimal competency. Basic achievement is 
less than mastery but more than the lowest level of performance on NAEP. 

Finally, even the best students you know may not meet the requirements 
for Advanced performance on NAEP. (Loomis & Bourque, 2001). 

Ad Hoc Committee Report. In 2002, the Board’s Ad Hoc Committee on Confirming 
Test Results issued its report. The committee did not examine (i.e., compare or 
contrast) the differing interpretations and uses that NCLB and NAEP had stipulated 
for the achievement levels, whether basic, proficient, or advanced. The report, 
however, did contain several important findings, three of which are key. First, 
NAEP can be used as evidence to confirm the general trend of state test results in grades 4 
and 8 reading and mathematics. Second, confirmation of state AYP results should NOT be 
conducted on a point-by-point basis. Third, when confirming state AYP results, differences 
between NAEP and the state testing program must be explored and reported (Ad Hoc 
Committee, 2002). 

NAEP Validity Studies Panel. In 2004, the NAEP Validity Studies Panel issued a 
report on a statistical analysis that concluded “NAEP’s ‘percent At or Above Basic’ is the 
most directly comparable statistic for confirming state AYP results” (Mosquin & Chromy, 
2004). When results from NAEP 2005 were released, the percent at or above Basic was 
given prominence in some reports for the first time ever. The NAEP reports were prepared 
by the National Center for Education Statistics and released by the National Assessment 
Governing Board. This change in reporting practice indicated that both parties accepted the 
NAEP Validity Studies Panel’s findings as consistent with existing NAEP policy and practice. 

NAEP Frameworks. NAEP frameworks are not curriculum documents that express 
what students should be learning in America’s schools. They are a description of what will 
be tested and how the scores should be interpreted. The framework for each subject 
expands NAEP’s policy definitions of Basic, Proficient, and Advanced. 

The policy definition notes that Basic "denotes partial mastery of prerequisite 
knowledge and skills that are fundamental for proficient work at each grade." Language 
from the framework for the 2007 reading assessment clarifies "prerequisite knowledge and 
skills" for Basic at the fourth grade. Fourth-grade students performing at the Basic level 
should demonstrate an understanding of the overall meaning of what they read. When 
reading text appropriate for fourth graders, they should be able to make relatively obvious 
connections between the text and their own experiences and extend the ideas in the text by 
making simple inferences. (National Assessment Governing Board, 2006). 

Language from the framework for the NAEP 2009 reading assessment clarifies 
"prerequisite knowledge and skills" for Proficient. "Proficient readers," it says, "will have 
sizeable meaning vocabularies, including knowledge of many words and terms above grade 
level." (National Assessment Governing Board, 2008). This contrasts with NCLB’s "on- 
grade-level, at-grade-level" yoke. 

Clearly, this language from the NAEP reading frameworks indicates that NAEP Basic 
represents an estimate of "grade-level expectations," and that NAEP Proficient demands 
some above-grade-level knowledge and skills. Once again, NCLB requires a state to define 
proficient as meeting grade-level expectations on state content. There can be no doubt that 
using a state’s NAEP Proficient score to confirm a state’s NCLB proficient score would 
surely result in mistaken and misleading conclusions. This, however, has been the prime 
methodology over the last half-decade for NAEP-state proficiency analyses conducted and 
published by national foundations, institutes, and think tanks. 

NAEP Achievement Levels and “Letter Grades” 

One way to understand the NAEP achievement levels is to link NAEP’s descriptive 
language to letter grades (i.e., A, B, etc.) that one would likely see on the report cards of 
students performing at each NAEP achievement level (Stoneberg, 2007b). Figure 2 
compares the language used to describe NAEP achievement level scores and “letter grades” 
used to describe corresponding classroom performance levels. The language describing 
NAEP Basic corresponds to letter grades ranging from C- to B, which represents meeting 
grade-level expectations for that particular grade. 



Figure 2. Comparing language used to describe the NAEP achievement levels and 
“letter grades” used to describe corresponding classroom performance. 

In 2007, NCES published a statistical analysis report finding that “A majority (56%) 
of Proficient and above performers on the 1992 NAEP-scaled mathematics assessment 
maintained an ‘A’ average in mathematics throughout high school. Some 20 percent of ‘B’ 
students and 5 percent of ‘C’ students reached the proficient or advanced levels” (Scott & 
Ingels, 2007). Figure 3 presents these results graphically. The interpretation of these 
results is muddled somewhat because not all students take the same mathematics courses 
in high school. One student may have an A average in two courses (e.g., general math and 
consumer math, really 8th grade arithmetic a second time and a third time), while another 
student may have an A average in four rigorous courses including AP Statistics and Math 
Analysis. The latter will likely reach Proficient on NAEP, while the former probably will not. 
However, a student with a C average through four rigorous mathematics courses may still 
reach the NAEP Proficient level. In general, these results leave the impression that NAEP 
Proficient requires a performance that is higher than just meeting grade-level expectations. 

Figure 3. The percentage of high school seniors by mathematics GPA who scored 
“At or Above Proficient” on a 1992 NAEP-scaled mathematics assessment. 

Use Achievement Level Scores with Caution 



Congress has mandated external evaluations of NAEP, the most recent of which was 
conducted by the National Academy of Sciences (Pellegrino, Jones & Mitchell, 1998). The 
Academy found that NAEP’s procedure for setting cut-scores was fundamentally flawed 
because it rested on “informed judgment” rather than a “highly objective process,” and noted 
that the process had produced some unreasonable results. Even though its report was highly 
critical of NAEP’s achievement levels, the Academy did recommend their cautious use for 
drawing attention to changes in student performance over time. 

NAEP’s current achievement levels should continue to be used on a 
developmental basis only. If achievement-level results continue to be 
reported for future [...] the reports should strongly and clearly emphasize 
that achievement levels are still under development, and should be 
interpreted and used with caution. Reports should focus on the change, 
from one administration of the assessment to the next, in the percentages 
of students in each of the categories determined by the existing 
achievement-level cutscores [...] rather than focusing on the percentages in 
each category in a single year. (Pellegrino, Jones & Mitchell, 1998). 

In NCLB, Congress required the Secretary to use NAEP data to evaluate the Title I 
program, but NCLB also required that NAEP achievement levels be used on a trial basis 
until the Commissioner of Education Statistics determines that the achievement levels are 
"reasonable, valid, and informative to the public." Until that determination is made, the law 
requires the Commissioner and the National Assessment Governing Board to state clearly 
the trial status of the achievement levels in all NAEP reports. 

The website for the “Nation’s Report Card” notes that, “The Board and NCES believe 
that the achievement levels are useful for reporting trends in the educational achievement 
of students in the United States. However, [...] NCES concludes that these achievement 
levels should continue to be used on a trial basis and should continue to be interpreted and 
used with caution.” The Board and NCES also note on the Nation’s Report Card website 
that a proven alternative to the current process of setting cut-scores has not yet been 
identified. They invite organizations and individuals with ideas for alternative models for 
setting cut-scores to present them for consideration. 

How Might a Confirming Analysis Be Done? Then. 

Given the interpretation and status of NAEP achievement levels and the stated 
purpose of the national assessment, it seemed to the author in 2007 that graphing trend 
lines for the state percent at or above proficient and the NAEP percent At or Above Basic 
side by side offered a defendable method for confirming state AYP results. If the trend 
lines moved in the same direction, NAEP was taken to confirm the state results. At 
least, this was the notion advanced in the author’s paper presented at the CCSSO 37th 
Annual National Conference on Large-Scale Assessment in Nashville. The passage, complete 
with its graphic, read (Stoneberg, 2007a): 

Figure "A" illustrates how NAEP might be used to confirm 
state testing results (Carr, 2002). It's a useful graphic for 
bringing together the points discussed in this paper. By 
comparing NAEP's percent at or above Basic to the state's 
percent at or above grade level (i.e., at or above proficient, in 
NCLB terms), the confirming analysis in Figure "A" 
recognizes that NAEP's definition of Proficient is not 
synonymous with grade-level proficiency in a subject. The 
different fill colors suggest differences between the two tests, 
which should be discussed in a narrative accompanying the 
graph. Moreover, the graph avoids point-by-point 
comparisons between NAEP and state achievement levels. 
Rather, it relies on the comparison of proficiency trend lines, a 
defendable method for using NAEP to confirm state AYP 
results. 

Figure “A”. Graphic illustration of how NAEP percent at or above Basic 
might be used to confirm state test results in the No Child Left Behind 
Act of 2001. 
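
The direction-of-trend check described in the 2007 passage can be sketched in a few lines. This is only an illustration of the idea as it was framed then (the following section explains why proficiency trends are ill-suited for such comparisons). The function names and all percentages below are hypothetical, invented for the example; they are not state or NAEP results.

```python
def direction(change):
    """Return +1 for a rise, -1 for a fall, and 0 for no change."""
    return (change > 0) - (change < 0)


def trends_agree(state_pct, naep_pct):
    """Compare the direction of change over the years both series share.

    Each argument maps an assessment year to a percentage. Only years
    present in both series are compared, since NAEP is not given annually.
    """
    years = sorted(set(state_pct) & set(naep_pct))
    return all(
        direction(state_pct[later] - state_pct[earlier])
        == direction(naep_pct[later] - naep_pct[earlier])
        for earlier, later in zip(years, years[1:])
    )


# Hypothetical percentages, invented for illustration only.
state_at_or_above_proficient = {2003: 75.0, 2004: 78.0, 2005: 80.0, 2007: 82.0}
naep_at_or_above_basic = {2003: 80.0, 2005: 81.0, 2007: 83.0}

print(trends_agree(state_at_or_above_proficient, naep_at_or_above_basic))
```

The sketch compares directions only, in keeping with the committee's advice against point-by-point comparison; it says nothing about the size of either change.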

How Might a Confirming Analysis Be Done? Now! 

The defendable method for conducting a NAEP-state confirming analysis by 
comparing their achievement level trends that the author presented at the Large-Scale 
Assessment Conference in Nashville in June 2007 was essentially rendered indefensible in 
December 2007. 

In its 1998 evaluation report, the National Academy of Sciences did recommend that 
NAEP achievement level results might be used (with caution) to plot a performance trend 
for a group. Under the blanket of the Academy’s recommendation, it had been assumed 
generally that proficiency data from NAEP and NCLB tests enjoyed the requisite statistical 
properties for sound trend comparisons. An unexamined assumption! 

A pivotal study by Andrew Ho (University of Iowa) that compared NAEP and state 
proficiency trend data, however, disputed the assumption: “Trend comparisons require 
both technical care and substantive consideration. As useful as PAC [percent above 
cut-score] statistics have been in communicating test results to the public, their properties as 
trend statistics render them ill-suited for trend comparison” (Ho, 2007). 

Dr. Ho presented two sessions at this workshop yesterday related to proficiency 
standards and defensible methods for making NAEP-state comparisons. Four points from 
his presentations were particularly noteworthy: 

• For NAEP-State comparisons, we need to get past proficiency standards. 

• The proficiency metric distorts just about every important large-scale test-driven 
inference. 

• Trend and gap interpretations can be inflated or deflated by cut-score location. 

• High-stakes trends, gaps, and gap trends should all be reported on a scale-score or 
effect-size metric (Ho, 2009). 
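
The point about cut-score location can be illustrated numerically. The sketch below is a hypothetical illustration, not Ho's analysis: it assumes normally distributed scale scores, and the mean, standard deviation, and three cut-scores are invented. It applies the same 3-point gain in the mean (0.1 standard deviation) and reports the resulting change in the percent above cut for each cut-score.

```python
from statistics import NormalDist

# Hypothetical scale, invented for illustration; not NAEP or state data.
SD = 30.0
MEAN_YEAR1 = 230.0
MEAN_YEAR2 = 233.0  # identical 3-point gain no matter where the cut sits


def percent_above_cut(mean, sd, cut):
    """Percent of a normal score distribution at or above a cut-score."""
    return 100.0 * (1.0 - NormalDist(mean, sd).cdf(cut))


def pac_change(cut):
    """Change in percent above cut between the two hypothetical years."""
    return (percent_above_cut(MEAN_YEAR2, SD, cut)
            - percent_above_cut(MEAN_YEAR1, SD, cut))


for label, cut in [("low cut", 200.0), ("middle cut", 230.0), ("high cut", 275.0)]:
    print(f"{label:>10}: change = {pac_change(cut):+.1f} percentage points")
```

Under these assumptions the change is largest when the cut-score sits near the middle of the distribution and smaller when it sits in either tail, even though the underlying gain in scale scores is the same in all three cases.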

The need to change metrics that Ho has advanced seems both credible and 
desirable. NAEP-state comparisons based on achievement level scores are indeed per se 
faulty. Unfortunately, the language in NCLB requires the Secretary to use the proficiency 
metric for NAEP-state comparisons. So until the federal law is changed, any analysis based 
on the proficiency metric that might be required by NCLB should, whenever possible, be 
accompanied by a related analysis based on scale scores, on effect sizes, or on both. 
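
A minimal sketch of what such an accompanying analysis might compute is shown below. It assumes student-level scale scores are available for two administrations; the score lists and the cut-score are hypothetical, and the effect size used here is a standardized mean difference with a pooled standard deviation, one common choice rather than a method prescribed by NCLB, NAEP, or Ho.

```python
from math import sqrt
from statistics import mean, variance


def percent_at_or_above(scores, cut):
    """Proficiency metric: percent of scores at or above the cut-score."""
    return 100.0 * sum(s >= cut for s in scores) / len(scores)


def scale_score_change(before, after):
    """Scale-score metric: difference in mean scale scores."""
    return mean(after) - mean(before)


def effect_size(before, after):
    """Effect-size metric: mean difference divided by the pooled SD."""
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * variance(before)
                  + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
    return (mean(after) - mean(before)) / sqrt(pooled_var)


# Hypothetical scale scores and cut-score, invented for illustration only.
scores_2005 = [198, 205, 212, 219, 224, 230, 236, 241, 249, 258]
scores_2007 = [s + 4 for s in scores_2005]
CUT = 225

print("percent at or above cut:",
      percent_at_or_above(scores_2005, CUT), "->",
      percent_at_or_above(scores_2007, CUT))
print("scale-score change:", scale_score_change(scores_2005, scores_2007))
print("effect size:", round(effect_size(scores_2005, scores_2007), 2))
```

Reporting the three figures together lets a reader see whether a change in the required proficiency percentage is matched by a comparable change on the scale-score and effect-size metrics.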



# # # 

Standards for educational and psychological testing were updated in 1999 by a 
joint effort of the 

• American Educational Research Association 

• American Psychological Association 

• National Council on Measurement in Education. 

Standards: Valid Use of Test Scores 

• Standard 1.2. The test developer should set forth clearly how test scores are 
intended to be interpreted and used. 

• Standard 1.4. If a test is used in a way that has not been validated, it is incumbent 
on the user to justify the new use, collecting new evidence if necessary. 

Joint Committee on Standards for Educational and Psychological Testing of the American 
Educational Research Association, the American Psychological Association, and the National 
Council on Measurement in Education. Standards for educational and psychological testing. 
Washington, D.C.: American Educational Research Association, 1999. 

Note 4: National Assessment of Educational Progress (NAEP) 

The National Assessment of Educational Progress (NAEP), governed by the National 
Assessment Governing Board (NAGB), is administered regularly in a number of academic 
subjects. Since its creation in 1969, NAEP has had two major goals: to assess student 
performance reflecting current educational and assessment practices and to measure change 
in student performance reliably over time. To address these goals, NAEP includes a main 
assessment and a long-term trend assessment. The two assessments are administered to 
separate samples of students at separate times, use separate instruments, and measure 
different educational content. Thus, results from the two assessments should not be 
compared. 

Since its creation in 1969, NAEP has had two major goals: to assess 
student performance reflecting current educational and assessment 
practices, and to measure change in student performance reliably 
over time. 

[...] background questionnaires (for the student, teacher, and school) to provide information on 
instructional experiences and the school environment at each grade. 




Public Law 107-110 
107th Congress 

An Act 

To close the achievement gap with accountability, flexibility, and choice, so that 
no child is left behind. [Jan. 8, 2002] [H.R. 1] 

Be it enacted by the Senate and House of Representatives of 
the United States of America in Congress assembled, 

SECTION 1. SHORT TITLE. 

This title may be cited as the “No Child Left Behind Act 
of 2001”. 

“PART E - NATIONAL ASSESSMENT OF TITLE I 

“SEC. 1501. EVALUATIONS. [20 USC 6491] 

“(a) National Assessment of Title I. - 

“(1) In general. - The Secretary shall conduct a national 
assessment of the programs assisted under this title and the 
impact of this title on States, local educational agencies, schools, 
and students. 

“(3) Sources of information. - In conducting the assessment 
under this subsection, the Secretary shall use information 
from a variety of sources, including the National Assessment 
of Educational Progress (carried out under section 411 of the 
National Education Statistics Act of 1994), State evaluations, 
and other research studies. 



Evaluation of Title I Program 



NCLB + NAEP 




No Child Left Behind Act of 2001 



Standards under this paragraph shall... 

“(II) describe two levels of high 
achievement (proficient and advanced) 
that determine how well children are 
mastering the material in the State 
academic content standards;” 



Interpretation and Use of NCLB Achievement Levels 

Office of Elementary and Secondary Education 

The “proficient” achievement level represents the 
attainment of grade-level expectations for that 
academic content area. 

Standards and Assessments Peer Review Guidance: Information and Examples for Meeting 
Requirements of the No Child Left Behind Act of 2001. Washington, D.C.: U.S. Department of Education, 
2004. 

Secretary of Education 

We remain committed to ensuring that all students can 
read and do math at grade level or better by 2014. This 
is the basic purpose and mission of the No Child Left 
Behind Act. 



Building on Results: A Blueprint for Strengthening the No Child Left Behind Act. Washington, D.C.: 
U.S. Department of Education, 2007. 

Interpretation and Use of NAEP Achievement Levels 
National Assessment Governing Board (NAGB) 



NAEP Achievement Level Reports (2001) 
NAGB Ad Hoc Committee Report (2002) 
NAEP Validity Studies Panel Report (2004) 



NAEP Frameworks 




Achievement 
Levels Report 

Reading 



Writing, Mathematics, 
Science, U.S. History, 
Geography, and Civics 



Loomis, S.C., and Bourque, M.L. (Eds.). (2001). 
National Assessment of Educational Progress 
Achievement Levels, 1992-1998 for Reading. 
Washington, DC: National Assessment Governing 
Board, U.S. Department of Education. 

How Should Achievement Levels Be Interpreted? 

Notice that there is no mention of “at grade level” performance in 
these achievement goals. In particular, it is important to 
understand clearly that the Proficient achievement level does not 
refer to “at grade” performance. 

Nor is performance at the Proficient level synonymous with 
“proficiency” in the subject. That is, students who may be 
considered proficient in a subject, given the common usage of the 
term, might not satisfy the requirements for performance at the 
NAEP achievement level. 

Further, Basic achievement is more than minimal competency. 

Finally, even the best students you know may not meet the 
requirements for Advanced performance on NAEP. 

National Assessment Governing Board 

National Assessment of Educational Progress 



Using the National Assessment of Educational Progress 
To Confirm State Test Results 



A Report of 

The Ad Hoc Committee on Confirming Test Results 



March 1, 2002 



Ad Hoc Committee on Confirming Test Results 

Michael Nettles, Chair 

Daniel Domenech 

Edward Haertel 

Nancy Kopp 

Debra Paulson 

Diane Ravitch 

Michael Ward 

Marilyn Whirry 

Dennie Palmer Wolf 

Planning Work Group 
Mark Reckase, Chair 
Peter Behuniak 
David Francis 
Paul Holland 
Scott Jenkins 
Mary Jean LeTendre 
Gerry Shelton 
Wendy Yen 

Governing Board Staff 
Ray Fields 



NAEP can be used as 
evidence to confirm the 
general trend of state test 
results in grades 4 and 8 
reading and mathematics. 

Confirmation of state AYP 
results should NOT be 
conducted on a point-by- 
point basis. 

When confirming state 
AYP results, differences 
between NAEP and the 
state testing program must 
be explored and reported. 

Differences must be explored and reported . . . 

“Potential differences between NAEP and state testing programs 
include: content coverage in the subjects, definitions of subgroups, 
changes in the demography within a state over time, sampling 
procedures, standard-setting approaches, reporting metrics, student 
motivation in taking the state test versus taking NAEP, mix of item 
formats, test difficulty, etc. Such differences may be minimal or 
great in number and in size and cannot reasonably be expected to 
operate in all states in equal fashion.” 

Ad Hoc Committee on Confirming Test Results. Using the National Assessment of Educational 
Progress to confirm state test results. Washington, D.C.: National Assessment Governing Board, 2002. 

Federal Sample Sizes for 
Confirmation of State Tests in the 
No Child Left Behind Act 



Paul Mosquin 
RTI International 

James Chromy 
RTI International 



Commissioned by the NAEP Validity Studies (NVS) Panel 
May 2004 

George W. Bohrnstedt, Panel Chair 
Frances B. Stancavage, Project Director 



The NAEP Validity Studies Panel was formed by the American Institutes for Research 
under contract with the National Center for Education Statistics. Points of view or 
opinions expressed in this paper do not necessarily represent the official positions of the 
U.S. Department of Education or the American Institutes for Research. 



NAEP’s “percent At 
or Above Basic” is 
the most directly 
comparable statistic 
for confirming state 
AYP results. 



Mosquin, P., and Chromy J. (2004). Federal 
sample sizes for confirmation of state tests 
in the No Child Left Behind Act. 
Washington, D.C.: American Institutes for 
Research, NAEP Validity Studies Panel. 

Student Percentage at NAEP Achievement Levels 

[Chart residue from two Idaho Snapshot Reports. Rows: Idaho (public) by assessment year and Nation (public). Categories: Below Basic, Basic, Proficient, and Advanced. Footnote: Accommodations were not permitted for this assessment. The NOTE giving the scale-score ranges for the NAEP mathematics achievement levels is not legible in the extracted text.] 

Idaho Snapshot Report 
Mathematics 2003, Gr 4 

Idaho Snapshot Report 
Mathematics 2005, Gr 4 



Note: In some NCES prepared reports with results 
from NAEP 2005, the percent at or above Basic was 
given prominence for the first time. This change in 
reporting practice is in harmony with the NAEP 
Validity Studies Panel’s recommendations. 
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Reading Framework for the 
2007 National Assessment 
of Educational Progress 




National Assessment Governing Board 

U.S. Department of Education 



Fourth-grade students 
performing at the Basic level 
should demonstrate an 
understanding of the overall 
meaning of what they read. 
When reading text 
appropriate for fourth 
graders, they should be able 
to make relatively obvious 
connections between the text 
and their own experiences 
and extend the ideas in the 
text by making simple 
inferences. 



Reading Framework for 2007 
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Reading Framework 
for the 2009 
National Assessment 
of Educational Progress 




National Assessment Governing Board 
U.S. Department of Education 



“Proficient readers will have 
sizeable meaning 
vocabularies, including 
knowledge of many words 
and terms above grade level.” 

National Assessment Governing Board. 
(2008). Reading Framework for the 2009 
National Assessment of Educational 
Progress. Washington, D.C.: U.S. 
Department of Education, National Center 
for Education Statistics. 
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“Letter Grades” for NAEP Achievement Levels 

Achievement Level   NAEP Achievement Level Descriptors                  Letter Grade Range 

Advanced                                                                A+ 

Proficient          Some of the best students you know                  A to B+ 
                    Many words and terms above grade level 
                    Mastery of challenging content 

Basic               Proficiency in subject (common meaning)             B to C- 
                    Overall understanding of grade-appropriate text 
                    More than minimal competency 

Below Basic         Minimally competent                                 D+ to F 



Stoneberg, B.D. (2007). Using NAEP to Confirm State Test Results in the No Child 
Left Behind Act. Practical Assessment, Research & Evaluation, 12(5). Available online: 
http://www.pareonline.net/pdf/v12n5.pdf 
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Interpreting 12th-Graders’ NAEP-Scaled 
Mathematics Performance Using High School 
Predictors and Postsecondary Outcomes 
From the National Education Longitudinal 
Study of 1988 (NELS:88) 

Statistical Analysis Report 




A majority (56%) of 
Proficient and above 
performers on the 1992 
NAEP-scaled mathematics 
assessment maintained an 
“A” average in 
mathematics throughout 
high school. Some 20 
percent of “B” students and 
5 percent of “C” students 
reached the proficient or 
advanced levels. 



Scott, L.A., and Ingels, S.J. (2007). Interpreting 12th-graders’ NAEP-scaled mathematics performance 
using high school predictors and postsecondary outcomes from the National Education Longitudinal Study 
of 1988 (NELS:88) (NCES 2007-328). Washington, D.C.: National Center for Education Statistics, U.S. 
Department of Education. 
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The Percentage of High School Seniors by 
Mathematics GPA Who Scored "At or Above 
Proficient" on a 1992 NAEP-Scaled 
Mathematics Assessment 
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Source: NCES 2007-328 
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Grading the Nation’s Report Card 

Evaluating NAEP and Transforming the Assessment of Educational Progress 
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NAEP’s current achievement levels 
should continue to be used on a 
developmental basis only. If 
achievement-level results continue to 
be reported for future... the reports 
should strongly and clearly emphasize 
that achievement levels are still under 
development, and should be 
interpreted and used with caution. 
Reports should focus on the change, 
from one administration of the 
assessment to the next, in the 
percentages of students in each of the 
categories determined by the existing 
achievement-level cutscores... rather 
than focusing on the percentages in 
each category in a single year. 

Pellegrino, J.W., Jones, L.R., and Mitchell, K.J. (Eds.). Grading the Nation’s 
Report Card: Evaluating NAEP and transforming the assessment of 
educational progress. Washington, DC: National Academy Press, 1998. 
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THE NATION'S REPORT CARD 

National Assessment of Educational Progress 



The 2001 reauthorization law requires that the achievement levels be 
used on a trial basis until the Commissioner of Education Statistics 
determines that the achievement levels are "reasonable, valid, and 
informative to the public" (P.L. 107-110, 115 Stat. 1425 [2002]). Until that 
determination is made, the law requires the Commissioner and the 
National Assessment Governing Board to state clearly the trial status of 
the achievement levels in all NAEP reports. 

A proven alternative to the current process has not yet been identified. 

The Board and NCES believe that the achievement levels are useful for 
reporting trends in the educational achievement of students in the 
United States. However, [...] NCES concludes that these achievement 
levels should continue to be used on a trial basis and should continue 
to be interpreted and used with caution. 

See http://nces.ed.gov/nationsreportcard/achlevdev.asp?id=re 
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Carr, P.G. (2002, August). Legislative and Policy Update. PowerPoint presentation at the 
National Assessment of Educational Progress (NAEP) State Coordinator Two-day 
Orientation of the NAEP State Service Center, Washington, D.C. 
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Reliability as Argument 
Jay Parkes 

Discrepancies Between Score Trends 
from NAEP and State Tests: 
A Scale-Invariant Perspective 
Andrew D. Ho 

Subscores Based on Classical Test 
Theory: To Report or Not to Report 
Sandip Sinharay, Shelby Haberman, 
and Gautam Puhan 

Validity Issues in Test Speededness 
Ying Lu and Stephen G. Sireci 

An NCME Instructional Module 
on Estimating Item Response 
Theory Models Using Markov 
Chain Monte Carlo Methods 
Jee-Seon Kim and Daniel M. Bolt 

2008 NCME Annual Meeting Ad 
2008 NCME Election Slate 

National Council on Measurement in Education 



“Trend comparisons 
require both technical 
care and substantive 
consideration. As 
useful as PAC [percent 
above cut-score] 
statistics have been in 
communicating test 
results to the public, 
their properties as 
trend statistics render 
them ill-suited for 
trend comparison.” 

Ho, A.D. (2007). Discrepancies between 
score trends from NAEP and state tests: A 
scale-invariant perspective. Educational 
Measurement: Issues and Practice, 26(4), 
pp. 11-20. 
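
The point in the quotation above can be illustrated with a small numerical sketch. It assumes, purely for illustration, that scores are normally distributed, and it uses made-up means, a made-up standard deviation, and made-up cut scores; none of these values are NAEP or state results, and the function names are hypothetical. The same 3-point gain in the mean produces a different change in percent above cut (PAC) depending on where the cut score sits, while the effect size stays the same.

```python
from statistics import NormalDist


def percent_at_or_above(mean: float, sd: float, cut: float) -> float:
    """Percent of a normally distributed score scale at or above `cut`."""
    return 100.0 * (1.0 - NormalDist(mu=mean, sigma=sd).cdf(cut))


def pac_change(mean_before: float, mean_after: float, sd: float, cut: float) -> float:
    """Change in percent above cut (percentage points) when the mean shifts."""
    return (percent_at_or_above(mean_after, sd, cut)
            - percent_at_or_above(mean_before, sd, cut))


def effect_size(mean_before: float, mean_after: float, sd: float) -> float:
    """Standardized mean difference; it does not depend on any cut score."""
    return (mean_after - mean_before) / sd


# Hypothetical values for illustration only (assumptions, not real results).
MEAN_BEFORE, MEAN_AFTER, SD = 230.0, 233.0, 30.0
CUTS = {"low cut": 200.0, "cut at the mean": 230.0, "high cut": 275.0}

if __name__ == "__main__":
    print(f"scale score gain: {MEAN_AFTER - MEAN_BEFORE:.1f}")
    print(f"effect size: {effect_size(MEAN_BEFORE, MEAN_AFTER, SD):.2f}")
    for label, cut in CUTS.items():
        change = pac_change(MEAN_BEFORE, MEAN_AFTER, SD, cut)
        print(f"{label} ({cut:.0f}): PAC change = {change:+.1f} points")
```

Under these assumptions the PAC change is largest when the cut score lies near the middle of the distribution and smaller when the cut lies in either tail. Two tests that place "proficient" at different points could therefore report different proficiency trends for an identical underlying gain, which is one reason to accompany proficiency scores with scale scores or effect sizes.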
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
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Council of Chief State School Officers 
37th Annual National Conference on Large-Scale Assessment 
Nashville, Tennessee 
June 19, 2007 

An Explanation for the Large Differences between State and 
NAEP “Proficiency” Scores Reported for Reading in 2005 

Bert D. Stoneberg 
NAEP State Coordinator 
Idaho State Board of Education 

Abstract 

The No Child Left Behind Act (NCLB) permits the Secretary of Education to use 
NAEP achievement level scores, in concert with other data, to confirm state testing 
results. The U.S. Department of Education has not yet published a guidance document 
describing how NAEP might be used appropriately. A review of the literature from the 



Stoneberg, B.D. (2007, June). An 
explanation for the large differences 
between state and NAEP "proficiency" 
scores reported for reading in 2005. 
Paper presented at the Council of Chief State 
School Officers (CCSSO) National 
Conference on Large-Scale Assessment, 
Nashville, TN. Available online: 
http://www.eric.ed.gov/contentdelivery/servlet/ERICServlet?accno=ED497395 



Guest Editorial: Martin Harris on the NAEP 

httpc /Af ttradcon. wordpres&CDrTV2008 ^9/2S/guest-editori al-martin-harris-on-lhe-na ep/ 

A Funny Thing Happened on the Way to the Forum (Part I of IV) 

My literary betters [ ] can plagiarize far more skillfully than I, and so my theft of the movie title 
“A Funny Thing...” to head this column refers not to the Roman Forum but rather to a 
colloquium of educators in - where else - Nashville [ ]. The official educator forum in the 
Volunteer State in mid-June of last year went unpublicized [ ] and I knew nothing of it until a 
few short weeks ago when I contacted the US Department of Education for an explanation of a 
puzzling subject in public education: the substantial discrepancy in student achievement test 
scores between the federal NAEP tests and all the State-preferred local tests. 

[ ] Until recently, no one paid much attention to this National Assessment of Educational 
Progress (NAEP) even though the resulting student test scores were uniformly quite dismal, [ ] 
about 2/3 of all test-takers couldn’t make “proficient” [ ] and couldn’t, therefore, function at 
grade level. [ ] Such an intractable problem calls for a conference; or, if you prefer, a Forum. 

And then a funny thing happened on the way to the Forum (or maybe once there, Quisnam 
teneo? Who knows?) a solution to the problem was discovered, or created, or invented. It can be 
found on pages 8 and 9 of the conference - oops, Forum - report, “An Explanation for the Large 
Differences Between State and NAEP “Proficiency” Scores Reported for Reading in 2005”. 
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Common (but False) Assumptions 

► American public schools are a dismal failure ... 

► Proficient is proficient is proficient ... 

► A test is a test is a test ... 

► Everyone is entitled to his or her own belief 
about how to interpret and use a test score ... 
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Advanced 


This level signifies superior performance. 


Proficient 


This level represents solid academic performance for each 
grade assessed. Students reaching this level have 
demonstrated competency over challenging subject matter, 
including subject-matter knowledge, application of such 
knowledge to real-world situations, and analytical skills 
appropriate to the subject matter. 


Basic 


This level denotes partial mastery of prerequisite knowledge 
and skills that are fundamental for proficient work at each 
grade. 



Basic denotes partial mastery of prerequisite knowledge and skills that are 
fundamental for proficient work. 

A grade of C- to B denotes partial mastery of prerequisite knowledge and 
skills that are fundamental for B+ to A work. 
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